In this assignment, you will complete the implementation of the NeuralNetwork class, starting with the code included in the 05 and 06 lecture notes. First, define your NeuralNetwork class to include just one hidden layer, as done in notes 05. Follow these steps:
- Define the __init__ function to accept three arguments: the number of inputs in each sample (columns of X), the number of units in the hidden layer, and the number of outputs of the output layer. Save them in self.n_inputs, self.n_hiddens_each_layer, and self.n_outputs, create the weight matrices in self.Ws, and initialize self.rmse_trace to an empty list.
- Define the _forward(self, X) function that returns the output of the network, Y, in standardized form, and creates self.Zs as a list consisting of the input X and the outputs of the hidden layer.
- Define the _gradients(self, X, T) function that returns the gradients of the mean squared error with respect to the weights in each layer.
- Define _calc_rmse_standardized as shown in notes 05.
- Define the train(self, Xtrain, Ttrain, Xtest, Ttest, n_epochs, learning_rate) function that
  - standardizes Xtrain and Ttrain and saves the standardization parameters (means and stds) in the member variables self.X_means, self.X_stds, self.T_means, and self.T_stds,
  - standardizes Xtest and Ttest using self.X_means, self.X_stds, self.T_means, and self.T_stds,
  - loops for n_epochs as shown in notes 05, and in each loop
    - calls the _forward function to calculate the outputs of all units,
    - calls the _gradients function to calculate the gradients of the mean squared error with respect to all weight matrices,
    - updates the weights and appends the train and test RMSE values to self.rmse_trace.
- Define the use(self, X) function that standardizes X using the standardization member variables, calls _forward to calculate the outputs of all units, and returns the network's output converted back to unstandardized form.
- Define the helper function _add_ones, to be called by the functions above.

Remember to name functions with a leading _ if they are not meant to be called by the users of your NeuralNetwork class.

Now test your implementation. You may use the same example data as used in notes 05. When you are happy with your test results:

- Copy your NeuralNetwork class code cell, and paste it after the code cells you used to test your one-hidden-layer NeuralNetwork class.
- Modify the pasted class so it allows any number of hidden layers, including no hidden layers at all, specified as an empty list []. Don't forget this case. The constructor, __init__, must now accept a list of numbers of units in each hidden layer, rather than just a single number of units. The length of this list determines the number of hidden layers. See the following examples for more details.
- Then, apply your NeuralNetwork class to the problem of predicting the value of concrete strength as described below.

import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd # for display and clear_output
import time
NeuralNetwork Class - One Hidden Layer

# insert your NeuralNetwork class definition here. This will be a large code cell when you are done!
class NeuralNetwork:
pass
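To illustrate the overall shape the steps above produce, here is a minimal, runnable sketch of a one-hidden-layer network with tanh hidden units. It is only an illustration under stated assumptions (the weight-initialization scale, the exact gradient scaling, and a simplified train signature without a test set are all choices made here), not the required solution; your class must follow the step list and signatures given in the instructions.

```python
import numpy as np

class NeuralNetworkSketch:
    """Hedged sketch of a one-hidden-layer regression network; not the required solution."""

    def __init__(self, n_inputs, n_hiddens, n_outputs):
        self.n_inputs, self.n_hiddens, self.n_outputs = n_inputs, n_hiddens, n_outputs
        # One weight matrix per layer; the +1 row holds the bias weights.
        self.Ws = [np.random.uniform(-1, 1, (n_inputs + 1, n_hiddens)) / np.sqrt(n_inputs + 1),
                   np.random.uniform(-1, 1, (n_hiddens + 1, n_outputs)) / np.sqrt(n_hiddens + 1)]
        self.rmse_trace = []

    def _add_ones(self, M):
        return np.hstack((np.ones((M.shape[0], 1)), M))

    def _forward(self, Xst):
        Z = np.tanh(self._add_ones(Xst) @ self.Ws[0])   # hidden layer outputs
        self.Zs = [Xst, Z]
        return self._add_ones(Z) @ self.Ws[1]           # standardized output Y

    def _gradients(self, Xst, Tst):
        Y = self._forward(Xst)
        delta = -(Tst - Y) / (Tst.shape[0] * Tst.shape[1])  # dMSE/dY (up to a factor of 2)
        grad_W1 = self._add_ones(self.Zs[1]).T @ delta
        # Back-propagate through tanh: its derivative is 1 - Z**2.
        delta_hidden = (delta @ self.Ws[1][1:, :].T) * (1 - self.Zs[1] ** 2)
        grad_W0 = self._add_ones(self.Zs[0]).T @ delta_hidden
        return [grad_W0, grad_W1]

    def train(self, X, T, n_epochs, learning_rate):
        # Save standardization parameters, then do plain gradient descent.
        self.X_means, self.X_stds = X.mean(axis=0), X.std(axis=0)
        self.T_means, self.T_stds = T.mean(axis=0), T.std(axis=0)
        Xst = (X - self.X_means) / self.X_stds
        Tst = (T - self.T_means) / self.T_stds
        for _ in range(n_epochs):
            for W, G in zip(self.Ws, self._gradients(Xst, Tst)):
                W -= learning_rate * G
            rmse_st = np.sqrt(np.mean((Tst - self._forward(Xst)) ** 2))
            self.rmse_trace.append(rmse_st * self.T_stds[0])  # unstandardized RMSE
        return self

    def use(self, X):
        Yst = self._forward((X - self.X_means) / self.X_stds)
        return Yst * self.T_stds + self.T_means

# Tiny demonstration on a sine curve.
Xdemo = np.linspace(0, 5, 50).reshape(-1, 1)
Tdemo = np.sin(Xdemo)
nnet_demo = NeuralNetworkSketch(1, 10, 1).train(Xdemo, Tdemo, n_epochs=4000, learning_rate=0.1)
print(f'RMSE after training: {nnet_demo.rmse_trace[-1]:.3f}')
```

The full assignment version additionally standardizes and tracks the test set inside train, as the step list describes.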
In this next code cell, I add a new method to your class that replaces the weights created in your constructor with non-random values to allow you to compare your results with mine, and to allow our grading scripts to work well.
def set_weights_for_testing(self):
    for W in self.Ws[:-1]:  # leave output layer weights at zero
        n_weights = W.shape[0] * W.shape[1]
        W[:] = np.linspace(-0.01, 0.01, n_weights).reshape(W.shape)
        for u in range(W.shape[1]):
            W[:, u] += (u - W.shape[1] / 2) * 0.2
    # Set output layer weights to zero
    self.Ws[-1][:] = 0
    print('Weights set for testing by calling set_weights_for_testing()')

setattr(NeuralNetwork, 'set_weights_for_testing', set_weights_for_testing)
NeuralNetwork Class - Multiple Hidden Layers

When your second version is working, you may delete the above code cell that defines your first version of NeuralNetwork.
class NeuralNetwork:
pass
# If you first develop your `NeuralNetwork` class in a python script file, named `A2mysolution.py`,
# you can import it here for testing.
# Before you check in your notebook, copy and paste the whole `NeuralNetwork` class definition in the
# above cell, and delete this cell.
# from A2mysolution import NeuralNetwork
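The part that changes most in this second version is constructing self.Ws from a list of hidden layer sizes. Here is a small sketch of just that piece; the uniform initialization scale is an assumption, not a requirement.

```python
import numpy as np

# Build one weight matrix per layer from a list of hidden layer sizes.
# An empty list produces a single input-to-output matrix, i.e. a linear model.
def make_weight_matrices(n_inputs, n_hiddens_each_layer, n_outputs):
    shapes = []
    n_in = n_inputs
    for n_units in n_hiddens_each_layer:
        shapes.append((n_in + 1, n_units))   # +1 row for the bias input
        n_in = n_units
    shapes.append((n_in + 1, n_outputs))     # output layer
    return [np.random.uniform(-1, 1, shape) / np.sqrt(shape[0]) for shape in shapes]

print([W.shape for W in make_weight_matrices(1, [3, 2], 1)])  # [(2, 3), (4, 2), (3, 1)]
print([W.shape for W in make_weight_matrices(3, [], 2)])      # [(4, 2)]
```

Note that the shapes for (1, [3, 2], 1) match the shapes of nnet.Ws in the test cells below.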
Here we test your new NeuralNetwork
class that allows 0, 1, 2, or more hidden layers with some simple data.
X = np.arange(0, 10, 0.1).reshape(-1, 1)
T = np.sin(X) + 0.01 * (X ** 2)
X.shape, T.shape
((100, 1), (100, 1))
# Collect every 5th sample as the test set.
test_rows = np.arange(0, X.shape[0], 5)
# All remaining samples are in the train set.
train_rows = np.setdiff1d(np.arange(X.shape[0]), test_rows)
Xtrain = X[train_rows, :]
Ttrain = T[train_rows, :]
Xtest = X[test_rows, :]
Ttest = T[test_rows, :]
print(f'{Xtrain.shape=} {Ttrain.shape=} {Xtest.shape=} {Ttest.shape=}')
Xtrain.shape=(80, 1) Ttrain.shape=(80, 1) Xtest.shape=(20, 1) Ttest.shape=(20, 1)
plt.plot(Xtrain, Ttrain, 'o', label='Train')
plt.plot(Xtest, Ttest, 'o', label='Test')
plt.legend();
n_inputs = X.shape[1]
n_outputs = T.shape[1]
nnet = NeuralNetwork(n_inputs, [3, 2], n_outputs)
nnet
NeuralNetwork(1, [3, 2], 1)
nnet.n_inputs, nnet.n_hiddens_each_layer, nnet.n_outputs
(1, [3, 2], 1)
nnet.rmse_trace
[]
nnet.Ws
[array([[ 0.67704045, 0.70371353, 0.00252495], [ 0.55918031, -0.38161161, -0.05879949]]), array([[-0.3809577 , 0.38959034], [-0.01438485, 0.36708073], [-0.31925238, 0.49403834], [-0.13570525, 0.33183844]]), array([[0.], [0.], [0.]])]
nnet.set_weights_for_testing()
Weights set for testing by calling set_weights_for_testing()
nnet.Ws
[array([[-0.31 , -0.106, 0.098], [-0.298, -0.094, 0.11 ]]), array([[-0.21 , -0.00714286], [-0.20428571, -0.00142857], [-0.19857143, 0.00428571], [-0.19285714, 0.01 ]]), array([[0.], [0.], [0.]])]
nnet.train(Xtrain, Ttrain, Xtest, Ttest, n_epochs=1, learning_rate=0.1)
NeuralNetwork(1, [3, 2], 1)
nnet.Zs
[array([[-1.73291748], [-1.55962573], [-1.38633399], [-1.21304224], [-1.03975049], [-0.86645874], [-0.69316699], [-0.51987524], [-0.3465835 ], [-0.17329175], [ 0. ], [ 0.17329175], [ 0.3465835 ], [ 0.51987524], [ 0.69316699], [ 0.86645874], [ 1.03975049], [ 1.21304224], [ 1.38633399], [ 1.55962573]]), array([[ 2.03527172e-01, 5.68329348e-02, -9.23569751e-02], [ 1.53544458e-01, 4.05825180e-02, -7.34264441e-02], [ 1.02763480e-01, 2.43106038e-02, -5.44428527e-02], [ 5.14411404e-02, 8.02579805e-03, -3.54198229e-02], [-1.54354032e-04, -8.26326587e-03, -1.63710911e-02], [-5.17490267e-02, -2.45479456e-02, 2.68953195e-03], [-1.03068918e-01, -4.08196082e-02, 2.17482009e-02], [-1.53845874e-01, -5.70696481e-02, 4.07910762e-02], [-2.03823074e-01, -7.32895056e-02, 5.98043640e-02], [-2.52760066e-01, -8.94706847e-02, 7.87743562e-02], [-3.00437097e-01, -1.05604771e-01, 9.76874699e-02], [-3.46658600e-01, -1.21683448e-01, 1.16530286e-01], [-3.91255732e-01, -1.37698517e-01, 1.35289586e-01], [-4.34087939e-01, -1.53641907e-01, 1.53952389e-01], [-4.75043566e-01, -1.69505700e-01, 1.72505989e-01], [-5.14039585e-01, -1.85282137e-01, 1.90937983e-01], [-5.51020549e-01, -2.00963638e-01, 2.09236306e-01], [-5.85956907e-01, -2.16542814e-01, 2.27389262e-01], [-6.18842828e-01, -2.32012479e-01, 2.45385547e-01], [-6.49693682e-01, -2.47365664e-01, 2.63214279e-01]]), array([[-0.24026129, -0.00811343], [-0.23101806, -0.00792238], [-0.22158355, -0.00772975], [-0.21200657, -0.007536 ], [-0.20233913, -0.00734163], [-0.19263529, -0.00714712], [-0.18295003, -0.00695296], [-0.17333791, -0.00675965], [-0.16385194, -0.00656764], [-0.15454234, -0.00637739], [-0.14545559, -0.0061893 ], [-0.13663353, -0.00600376], [-0.12811273, -0.0058211 ], [-0.11992404, -0.00564161], [-0.11209237, -0.00546556], [-0.10463666, -0.00529315], [-0.09757005, -0.00512455], [-0.09090019, -0.00495988], [-0.08462964, -0.00479924], [-0.07875642, -0.00464269]]), array([[-0.00035099], [-0.00033749], [-0.00032372], [-0.00030973], 
[-0.00029561], [-0.00028144], [-0.00026729], [-0.00025326], [-0.0002394 ], [-0.00022581], [-0.00021254], [-0.00019965], [-0.00018721], [-0.00017525], [-0.00016381], [-0.00015292], [-0.0001426 ], [-0.00013286], [-0.0001237 ], [-0.00011512]])]
Why only 20 rows in these matrices? I thought I had 80 training samples!
print(nnet)
NeuralNetwork(1, [3, 2], 1) trained for 1 epochs with a final RMSE of None.
nnet.X_means, nnet.X_stds
(array([5.]), array([2.88530761]))
nnet.T_means, nnet.T_stds
(array([0.51792742]), array([0.74017845]))
[Z.shape for Z in nnet.Zs]
[(20, 1), (20, 3), (20, 2), (20, 1)]
nnet.Ws
[array([[-0.31 , -0.106, 0.098], [-0.298, -0.094, 0.11 ]]), array([[-0.21 , -0.00714286], [-0.20428571, -0.00142857], [-0.19857143, 0.00428571], [-0.19285714, 0.01 ]]), array([[5.46221054e-18], [1.45977062e-03], [3.30068017e-05]])]
dir(nnet)
['T_means', 'T_stds', 'Ws', 'X_means', 'X_stds', 'Zs', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_add_ones', '_calc_rmse_standardized', '_forward', '_gradients', 'n_epochs', 'n_hiddens_each_layer', 'n_inputs', 'n_outputs', 'rmse', 'rmse_trace', 'set_weights_for_testing', 'train', 'use']
def plot_data_and_model(nnet, Xtrain, Ttrain, Xtest, Ttest):
    plt.clf()
    plt.subplot(2, 1, 1)
    plt.plot(nnet.rmse_trace)
    plt.xlabel('Epoch')
    plt.ylabel('RMSE')
    plt.legend(('Train RMSE', 'Test RMSE'))
    plt.subplot(2, 1, 2)
    order = np.argsort(Xtrain, axis=0).flatten()
    Xtrain = Xtrain[order]
    Ttrain = Ttrain[order]
    plt.plot(Xtrain, nnet.use(Xtrain), '-', label='Ytrain')
    plt.plot(Xtrain, Ttrain, 'o', label='Ttrain', alpha=0.5)
    order = np.argsort(Xtest, axis=0).flatten()
    Xtest = Xtest[order]
    Ttest = Ttest[order]
    plt.plot(Xtest, nnet.use(Xtest), '-', label='Ytest')
    plt.plot(Xtest, Ttest, 'o', label='Ttest', alpha=0.5)
    plt.xlabel('X')
    plt.ylabel('T or Y')
    plt.legend();
X = np.arange(0, 10, 0.1).reshape(-1, 1)
T = np.sin(X) + 0.01 * (X ** 2)
# Collect every 5th sample as the test set.
test_rows = np.arange(0, X.shape[0], 5)
# All remaining samples are in the train set.
train_rows = np.setdiff1d(np.arange(X.shape[0]), test_rows)
Xtrain = X[train_rows, :]
Ttrain = T[train_rows, :]
Xtest = X[test_rows, :]
Ttest = T[test_rows, :]
print(f'{Xtrain.shape=} {Ttrain.shape=} {Xtest.shape=} {Ttest.shape=}')
n_inputs = X.shape[1]
n_outputs = T.shape[1]
nnet = NeuralNetwork(n_inputs, [10, 5], n_outputs)
nnet.set_weights_for_testing()
n_epochs = 10000
n_epochs_per_plot = 200
fig = plt.figure()
for reps in range(n_epochs // n_epochs_per_plot):
    plt.clf()
    nnet.train(Xtrain, Ttrain, Xtest, Ttest, n_epochs=n_epochs_per_plot, learning_rate=0.2)
    plot_data_and_model(nnet, Xtrain, Ttrain, Xtest, Ttest)
    ipd.clear_output(wait=True)
    ipd.display(fig)
ipd.clear_output(wait=True)
X = np.arange(-2, 2, 0.02).reshape(-1, 1)
T = np.sin(X) * np.sin(X * 10)
rows = np.arange(X.shape[0])
np.random.shuffle(rows)
ntrain = int(len(rows) * 0.7)
Xtrain = X[rows[:ntrain], :]
Ttrain = T[rows[:ntrain], :]
Xtest = X[rows[ntrain:], :]
Ttest = T[rows[ntrain:], :]
print(f'{Xtrain.shape=} {Ttrain.shape=} {Xtest.shape=} {Ttest.shape=}')
n_inputs = X.shape[1]
n_outputs = T.shape[1]
nnet = NeuralNetwork(n_inputs, [50, 10, 5], n_outputs)
nnet.set_weights_for_testing()
n_epochs = 80000
n_epochs_per_plot = 1000
fig = plt.figure(figsize=(10, 8))
for reps in range(n_epochs // n_epochs_per_plot):
    plt.clf()
    nnet.train(Xtrain, Ttrain, Xtest, Ttest, n_epochs=n_epochs_per_plot, learning_rate=0.05)
    plot_data_and_model(nnet, Xtrain, Ttrain, Xtest, Ttest)
    ipd.clear_output(wait=True)
    ipd.display(fig)
ipd.clear_output(wait=True)
Your results will not be the same, but your code should complete and make plots somewhat similar to these.
Apply your NeuralNetwork class to some concrete data!

Download the data from Calculate Concrete Strength at Kaggle. Read it into Python using the pandas.read_csv function. Assign the first 8 columns as inputs to X and the final column as target values to T. Make sure T is two-dimensional.
import pandas
# Read the csv file as a pandas.DataFrame
df = pandas.read_csv('concrete_data.csv')
Xd = df.iloc[:, range(8)]
X_names = Xd.columns
X = Xd.values
Td = df.iloc[:, 8:9]
T_names = Td.columns
T = Td.values
X.shape, X_names, T.shape, T_names
((1030, 8), Index(['Cement', 'Blast Furnace Slag', 'Fly Ash', 'Water', 'Superplasticizer', 'Coarse Aggregate', 'Fine Aggregate', 'Age'], dtype='object'), (1030, 1), Index(['Strength'], dtype='object'))
Before training your neural networks, partition the data into training and testing partitions, as shown here.
rows = np.arange(X.shape[0])
np.random.shuffle(rows)
ntrain = int(0.9 * len(rows))
Xtrain = X[rows[:ntrain], :]
Ttrain = T[rows[:ntrain], :]
Xtest = X[rows[ntrain:], :]
Ttest = T[rows[ntrain:], :]
print(f'Concrete: {Xtrain.shape=}, {Ttrain.shape=}, {Xtest.shape=}, {Ttest.shape=}')
Concrete: Xtrain.shape=(927, 8), Ttrain.shape=(927, 1), Xtest.shape=(103, 8), Ttest.shape=(103, 1)
Use your NeuralNetwork class to train a model that predicts the concrete strength from the eight input values. Experiment with a variety of neural network structures (numbers of hidden layers and units per layer), including no hidden layers, as well as learning rates and numbers of epochs. Show results for at least three different network structures, three learning rates, and three numbers of epochs, for a total of at least 27 results. Show your results in a pandas DataFrame with columns ('Structure', 'Epochs', 'Learning Rate', 'Train RMSE', 'Test RMSE').
Try to find good values for the RMSE on testing data. Discuss your results, including how good you think the RMSE values are by considering the range of concrete strength values given in the data.
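The experiment loop can be sketched as three nested loops that fill a list of rows for the DataFrame. In this sketch, FakeNet is a hypothetical stand-in (a plain least-squares linear model) so the code runs on its own, and the synthetic Xtr/Ttr/Xte/Tte arrays stand in for the concrete data; substitute your NeuralNetwork class and the real Xtrain, Ttrain, Xtest, Ttest. The structures, learning rates, and epoch counts are illustrative choices, not required settings.

```python
import numpy as np
import pandas as pd

class FakeNet:
    """Hypothetical stand-in for the student's NeuralNetwork class."""
    def __init__(self, n_inputs, structure, n_outputs):
        self.structure = structure  # ignored by this linear-model stand-in
    def train(self, Xtrain, Ttrain, Xtest, Ttest, n_epochs, learning_rate):
        X1 = np.hstack((np.ones((Xtrain.shape[0], 1)), Xtrain))
        self.w, *_ = np.linalg.lstsq(X1, Ttrain, rcond=None)
        return self
    def use(self, X):
        return np.hstack((np.ones((X.shape[0], 1)), X)) @ self.w

def rmse(Y, T):
    return np.sqrt(np.mean((Y - T) ** 2))

# Synthetic stand-in data; use your concrete-data partitions instead.
rng = np.random.default_rng(0)
Xtr = rng.uniform(size=(50, 3)); Ttr = Xtr.sum(axis=1, keepdims=True)
Xte = rng.uniform(size=(10, 3)); Tte = Xte.sum(axis=1, keepdims=True)

results = []
for structure in ([], [10], [20, 10]):        # network structures
    for lr in (0.01, 0.05, 0.1):              # learning rates
        for epochs in (1000, 2000, 5000):     # numbers of epochs
            net = FakeNet(Xtr.shape[1], structure, Ttr.shape[1])
            net.train(Xtr, Ttr, Xte, Tte, n_epochs=epochs, learning_rate=lr)
            results.append([str(structure), epochs, lr,
                            rmse(net.use(Xtr), Ttr), rmse(net.use(Xte), Tte)])

df_results = pd.DataFrame(results, columns=('Structure', 'Epochs', 'Learning Rate',
                                            'Train RMSE', 'Test RMSE'))
print(df_results.head())
```

Sorting df_results by 'Test RMSE' is a convenient way to spot the best combinations when you discuss your results.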
Your notebook will be run and graded automatically. Test this grading process by first downloading A2grader.zip and unzipping A2grader.py from it. Run the code in the following cell to demonstrate an example grading session. The remaining 20 points will be based on your discussion of this assignment.
A different, but similar, grading script will be used to grade your checked-in notebook. It will include additional tests. You should design and perform additional tests on all of your functions to be sure they run correctly before checking in your notebook.
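One useful additional test is a finite-difference gradient check against your _gradients function. The sketch below shows the idea on a standalone linear-model mean squared error; the function mse and the arrays Xg, Tg, w here are illustrative, and adapting the check to your network's weight matrices is left to you.

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of df/dw, element by element."""
    g = np.zeros_like(w)
    for i in range(w.size):
        w_plus = w.copy();  w_plus.flat[i] += eps
        w_minus = w.copy(); w_minus.flat[i] -= eps
        g.flat[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return g

# Example: gradient of mean squared error for a linear model Y = Xg @ w.
Xg = np.array([[1.0, 2.0], [3.0, 4.0]])
Tg = np.array([[1.0], [2.0]])
w = np.array([[0.1], [0.2]])

def mse(w):
    return np.mean((Tg - Xg @ w) ** 2)

analytic = -2 * Xg.T @ (Tg - Xg @ w) / Tg.size
print(np.allclose(numerical_gradient(mse, w), analytic, atol=1e-6))  # True
```

If the analytic and numerical gradients disagree for your network, the mismatch usually points directly at the layer whose backpropagation step is wrong.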
For the grading script to run correctly, you must first name this notebook as A2solution.ipynb
, and then save this notebook. Check in your A2solution.ipynb
notebook when you are ready.
%run -i A2grader.py
======================= Code Execution =======================

Extracting python code from notebook named A2solution.ipynb and storing in notebookcode.py
Removing all statements that are not function or class defs or import statements.

Testing
    n_inputs = 3
    n_hiddens = [2, 1]
    n_outputs = 2
    n_samples = 5
    X = np.arange(n_samples * n_inputs).reshape(n_samples, n_inputs) * 0.1
    T = np.hstack((X, X*2))
    nnet = NeuralNetwork(n_inputs, n_hiddens, n_outputs)
    nnet.set_weights_for_testing()
    # Set standardization variables so use() will run
    nnet.X_means = 0
    nnet.X_stds = 1
    nnet.T_means = 0
    nnet.T_stds = 1
    Y = nnet.use(X)

Weights set for testing by calling set_weights_for_testing()

--- 20/20 points. Returned correct value.

Testing
    n_inputs = 3
    n_hiddens = []  # NO HIDDEN LAYERS. SO THE NEURAL NET IS JUST A LINEAR MODEL.
    n_samples = 5
    X = np.arange(n_samples * n_inputs).reshape(n_samples, n_inputs) * 0.1
    T = np.hstack((X, X*2))
    n_outputs = T.shape[1]
    nnet = NeuralNetwork(n_inputs, n_hiddens, n_outputs)
    nnet.set_weights_for_testing()
    nnet.train(X, T, X, T, 1000, 0.01)
    Y = nnet.use(X)

Weights set for testing by calling set_weights_for_testing()

--- 20/20 points. Returned correct value.

Testing
    n_inputs = 3
    n_hiddens = [20, 20, 10, 10, 5]
    n_samples = 100
    X = np.arange(n_samples * n_inputs).reshape(n_samples, n_inputs) * 0.1
    T = np.log(X + 0.1)
    n_outputs = T.shape[1]
    Xtrain = X[np.arange(0, n_samples, 2), :]
    Ttrain = T[np.arange(0, n_samples, 2), :]
    Xtest = X[np.arange(1, n_samples, 2), :]
    Ttest = T[np.arange(1, n_samples, 2), :]
    def rmse(A, B):
        return np.sqrt(np.mean((A - B)**2))
    nnet = NeuralNetwork(n_inputs, n_hiddens, n_outputs)
    nnet.set_weights_for_testing()
    nnet.train(Xtrain, Ttrain, Xtest, Ttest, 6000, 0.01)
    Ytest = nnet.use(Xtest)
    err = rmse(Ytest, Ttest)
    print('RMSE', rmse(Ytest, Ttest))

Weights set for testing by calling set_weights_for_testing()
RMSE 0.0760118437192574

--- 40/40 points. Returned correct value.

======================================================================
A2 Execution Grade is 80 / 80
======================================================================

___ / 10 Correctly ran the required experiments with results in a pandas dataframe.
___ / 10 Provided a sufficient description (at least 10 sentences) of your experiments and results.

======================================================================
A2 Experiments and Discussion Grade is __ / 20
======================================================================

======================================================================
A2 FINAL GRADE is ___ / 100
======================================================================

Extra Credit: Apply your functions to a data set from the UCI Machine Learning Repository. Explain your steps and results in markdown cells.

A2 EXTRA CREDIT is 0 / 1
Apply your multilayer neural network code to a regression problem using data that you choose from the UCI Machine Learning Repository or the Kaggle Datasets. Pick a dataset that is listed as being appropriate for regression.